ABSTRACT



Mandates about the kind and amount of literacy instruction 
in classrooms and levels of reading required for successful promotion from 
grade to grade are currently dominating state and federal legislation. This 
paper gathers scholarship from three disparate sources to produce the first 
integrated review of policy-related research on literacy education: (1)
policy analyses that examine policies about literacy in the framework of
systemic reform; (2) measurement and evaluation studies, conducted by 
psychometricians, that examine the assessments mandated by policies; and (3) 
studies by literacy researchers that attend to policies and literacy-specific 
content. Each of these literatures has a unique set of questions, frameworks, 
methodologies, and audiences. Together, they provide a comprehensive
perspective on policies and literacy education. The paper concludes that the 
best way to influence policy, instruction, and children’s learning is for 
policy, measurement, and literacy researchers to collaborate in conducting 
and reporting research. Each group needs to learn more about the others' work 
in order to effect real change in literacy practice. Contains 121 references.

CIERA Inquiry 3: Policy and Profession

How do the policies and frameworks of states guide local curriculum, 
instruction, and assessment at early grade levels? What is the impact 
of standards and assessment on literacy practice and student 
learning? 

Mandates about the kind and amount of literacy instruction in classrooms 
and levels of reading required for successful promotion from grade to grade 
are currently dominating state and federal legislation. Valencia and Wixson
have gathered scholarship from three disparate sources to produce the first 
integrated review of policy-related research on literacy education: (a) policy 
analyses that examine policies about literacy in the framework of systemic 
reform; (b) measurement and evaluation studies, conducted by psychometri- 
cians, that examine the assessments mandated by policies; and (c) studies 
by literacy researchers that attend to policies and literacy-specific content. 
Each of these literatures has a unique set of questions, frameworks, method- 
ologies, and audiences. Together, they provide a comprehensive perspec- 
tive on policies and literacy education. 

The best way to influence policy, instruction, and children’s learning, say 
Valencia and Wixson, is for policy, measurement, and literacy researchers to 
collaborate in conducting and reporting research. Each group needs to learn 
more about the others’ work in order to effect real change in the lives of
children and teachers. 

In the past two decades, instruments of policy have reached into every
facet of our educational lives. The “tools” of policy include everything from 
new content standards and instructional frameworks to teacher certification 
requirements, systems of assessment, Title I allocations and requirements, 
and textbook adoption guidelines. This report focuses on a discussion of 
policy-oriented research on literacy content standards and assessment. 
Understanding how policy instruments such as standards and assessments 
have attained such importance in subject matter learning provides impor- 
tant background underlying much of the research. 

Historically, state policymakers have delegated their authority over public 
education to local school districts, particularly in matters of curriculum and 
instruction. Districts, in turn, have entrusted the curriculum to teachers or 
indirectly to textbook publishers, and done little to provide instructional 
guidance (Massell, Kirst, & Hoppe, 1997). Since the publication of the
now-famous report A Nation at Risk (National Commission on Excellence in
Education, 1983), however, states and districts have made unprecedented forays
into curriculum and instruction (Massell, Kirst, & Hoppe, 1997). This modern
reform movement has been characterized by efforts to create new “policy
instruments” to elicit, encourage, or demand changes in teaching and
learning, and to reduce the tangles of regulation, bureaucracy, proliferating
policy, and incoherent governance that would impede reform (Smith & O’Day,
1991). Included among the new policy instruments are the standards and
assessments that are the subject of the research we examine here. 

As we considered what literature to include in this review, we were aware 
that current reform efforts have contributed to an increased interest in policy
research. For example, the Office of Educational Research and Improvement
(OERI) funded the Consortium for Policy Research in Education in
1985 and the Center for the Study of Teaching and Policy in 1998, and the
American Educational Research Association established Division L on politics
and policymaking in 1997. In addition to the growing number of policy
researchers, researchers in the areas of measurement and evaluation have
become interested in policy because many reform initiatives have focused
on assessment as a primary vehicle for improving student achievement. Sim- 
ilarly, the many reform efforts aimed at improving the literacy levels of 
young Americans have led literacy researchers to become more interested in 
research on policy-related issues. The realization that policy-oriented 
research is being conducted from a variety of perspectives led us to 
approach this review on two levels. On one level, we characterize the 
nature of policy-oriented research on literacy standards and assessments. On 
another level, we review what research tells us about the impact of literacy 
standards and assessment on practice and student learning. 

To characterize the nature of literacy policy research, we examine the litera- 
ture on standards and assessment in relation to the policy, measurement, and 
literacy contexts from which it arises. As a result, we review three fairly dis- 
tinct sets of literature. Policy researchers generally set out to address policy 
issues head-on, and are less concerned with subject-matter specifics. Mea- 
surement researchers are also more concerned with general findings than 
with subject matter specifics, but they tend to focus on the qualities and 
influence of assessment policy tools rather than on policy questions per se. 
Literacy researchers are rarely driven by policy questions or issues; they are 
primarily interested in subject matter teaching and learning. Differences in 
perspective suggest different research questions, conceptual frameworks, 
methodologies, perspectives on literacy, and audiences for publications, 
which result in differences in what is learned about literacy standards and 
assessment. 

In order to present these different perspectives and findings clearly, we have 
organized this report into four sections. The first three sections present our 
review of the policy-oriented literature related to literacy standards and 
assessment in terms of the three research perspectives— policy, measure- 
ment, and literacy. Each of these sections has two parts: a brief discussion of 
background, situating the research in its larger context, and a review of the 
literacy policy research within each perspective. The report’s fourth section 
focuses on what we have learned from these bodies of research with regard 
to the nature of the research and the results of policy-oriented research on 
literacy standards and assessment.



The Policy Perspective 



Following the publication of A Nation at Risk (National Commission on 
Excellence in Education, 1983), two “waves” of school reform emerged 
(Lusi, 1997). The first wave involved state efforts to accomplish three goals:
(a) to raise coursework standards for high school graduation, (b) to imple- 
ment and/or expand assessment programs, and (c) to raise standards for pro- 
spective teachers (Goertz, Floden, & O’Day, 1995). The second wave of 
reform took the form of school restructuring, and combined three comple- 
mentary elements: (a) a call for higher and common expectations for ALL 
students, (b) an emphasis on new and more challenging teaching practices, 
and (c) dramatic changes in the organization and management of public 
schools (Elmore, 1990). 

These initial waves of reform did little to change the content of instruction,
especially with their focus on basic skills. Nor did they result in the desired 
changes in teaching, learning, and student achievement (Cuban, 1990; Fire- 
stone, Fuhrman, & Kirst, 1989). Fragmented and contradictory policies 
diverted teachers’ attention, provided little or no support for the type of 
professional learning they needed, and made it difficult to sustain the very 
promising reforms taking shape in individual schools or clusters of schools 
(Cohen & Spillane, 1992; Goertz et al., 1995).

Growing concerns about the educational preparation of the nation’s youth 
prompted President Bush and the nation’s governors to call an education 
summit in Charlottesville, Virginia, in September, 1989. This summit resulted 
in the establishment of six broad educational goals to be reached by the year 
2000 (National Education Goals Panel, 1991). In pursuit of the National Edu- 
cation Goals, the bipartisan National Council on Education Standards and 
Testing (NCEST) issued a report in January, 1992, recommending national 
content standards and a national system of assessments based on the new 
standards. Precedent for and guidance in developing national standards was 
to be found in the work of the National Council of Teachers of Mathematics 
(NCTM), published as Curriculum and Evaluation Standards for School Math- 
ematics in 1989. The logic was that once broad agreement had been 
achieved on what is to be taught and learned, then everything else in the 
system (e.g., tests, professional development, textbooks, software, etc.) 
could be redirected toward reaching those standards. This view has come to 
be known as “systemic reform.” 

Systemic reform advocates changing teaching as the most direct way to 
change students’ learning (Cohen, 1995), and it is posited as a way to
provide top-down support for bottom-up instructional improvement in
classrooms, schools, and districts. The key question for reformers has been how
to get there — how to foster (or mandate) changes in learning and teaching. 
Many systemic reformers have seen government as their chief vehicle. How- 
ever, efforts by groups such as the Coalition for Essential Schools, the Accel- 
erated Schools Network, and the New Standards Project— groups which 
have substantial resources but operate largely outside the framework of gov- 
ernmental policy — indicate that state and federal policies are not the only 
ways to pursue improved instruction. Systemic reformers have tended to 
focus on creating new policy instruments such as content standards or cur- 
ricular frameworks, assessments that are aligned with new content stan- 
dards, and changes in both preservice and inservice teacher education 
(Cohen, 1995). 

According to McLaughlin (1987), policy research into the late 1980s gener- 
ated a number of important lessons for policy, practice, and analysis by 
acknowledging the role of contextual factors such as local priorities, individ- 
ual beliefs and motivation, and the balance between support and pressure to 
change. Furthermore, McLaughlin saw these lessons as framing the concep- 
tual and instrumental challenge for the next generation of policy analysts — 
to describe a model of implementation highlighting individuals rather than 
institutions and viewing implementation issues in terms of individual actors’ 
incentives, beliefs, and capacities. Darling-Hammond (1990) added that
top-down policies could “constrain but not construct” change. She focused on
policy enactment, arguing a) that local leadership and motivations for
change are critical to policy success; b) that local agencies must adapt
policies rather than adopting them because local ideas and circumstances
always vary; and c) “that teachers’ and administrators’ opportunities for con- 
tinual learning, experimentation, and decision making during implementa- 
tion determine whether policies will come alive in schools or fade away 
when the money or enforcement pressures end” (p. 235).

At a deeper level, Darling-Hammond (1990) argued that we knew little about
the meaning of specific policies for educational life within classrooms. She 
indicated that advances in policy analysis during the 1980s made it possible 
to ask a number of new questions. For example, what differences do such 
advances actually make to teachers’ and students’ work together? How do 
teachers understand and interpret the intentions of new policies in the con- 
text of their knowledge, beliefs, and teaching circumstances? How and 
under what circumstances do policies that are intended to change teaching 
actually do so? These observations were presented in Darling-Hammond’s
(1990) introduction to a special issue of Educational Evaluation and Policy 
Analysis (EEPA) focused on case studies of California reform in K-12 mathe- 
matics education. These case studies were seen as leading the way toward a 
new generation of policy analysis that recognized “the importance of under- 
standing the transformation of policy into teacher actions from the vantage 
point of the teachers, themselves, as well as from that of the policy system” 
(p. 175).

Our review of policy research in the 1990s revealed relatively few studies 
that clearly addressed literacy standards and assessment. The few that did 
could be broken into two types. First, there were large-scale investigations 
of state reform efforts. These investigations began to link macro- and 
microlevels of analysis using classroom artifacts such as lesson assignments 
and interviews with teachers and administrators to get at the classroom per- 
spective. Second, there were investigations, often case studies, that 
explored more deeply the impact of policy instruments on teachers, 
schools, and districts. By limiting our review to policy research related to lit- 
eracy standards and assessments, we saw increased efforts to examine policy 
initiatives with what Darling-Hammond called a “pedagogical eye,” but little 
attention given to the role that different subject areas might play in imple- 
mentation (although there is some indication that this too may be changing; 
see Ball & Cohen, 1995). 



Research 



Large-scale investigations of state reform efforts.

A study by Goertz, Floden, and O’Day (1995) provides an example of how
large-scale policy research projects in the 1990s have begun to link macro-
and microlevels of analysis. The stated purposes for this study included
expanding our knowledge of state approaches to education reform; examin- 
ing district, school, and teacher responses to state reform policies in a small 
number of reforming schools and districts; and studying the capacity of the 
educational system to support education reform. The findings are based on 
case studies of 12 reforming schools located in six reforming school districts 
in three states that have taken somewhat different approaches to systemic 
reform— California, Michigan, and Vermont. The researchers interviewed 
educators, administrators and policymakers at the school, district, and state 
levels. They also surveyed and interviewed 60 teachers in each school.
Because it was too early in the reform movement to assess the impact of any 
particular state, district, and/or school strategy, this study was intended 
more as a description of what was happening than “what works.” 

It is noteworthy that not until halfway through the seven-page Executive 
Summary do Goertz et al. mention that this research focused on reform in 
mathematics and language arts. Since we are concerned here with
literacy-related policy research, we summarize only the section of the report that
focuses specifically on language arts policy. From a policy researcher’s
perspective, however, we should remember that this report is not about
mathematics or language arts reform. It is about systemic reform, and the attention
given to mathematics and language arts merely provides some specificity to 
the findings. 

The portion of the report that deals directly with language arts examines the 
degree to which teachers’ reports on their instruction were consistent with 
explicit or implicit curriculum recommendations set forth in curriculum 
frameworks and state assessments. For example, some of the aims of Califor- 
nia’s “meaning-centered” English-language arts reform were reflected in sur- 
vey data on reading instruction. Elementary teachers reported that, during 
reading instruction, their students spent three and one-half hours per week 
on comprehension strategies and responding to what they read. The least
amount of time was spent on word recognition skills (30 minutes) and
phonics (19 minutes). The California Department of Education’s English Language
Arts Framework (1987) also emphasized a literature-based curriculum that 
“engaged students with the vitality of ideas and values greater than those of 
the marketplace or video arcade” (p. 7). Elementary teachers indicated that 
80 percent of instructional time was spent using literature trade books, and 
the remaining time was distributed among reading or subject basals, work- 
books or worksheets, or something else. 

As in California, Michigan teachers emphasized content matching the 
“meaning-centered” view of Michigan’s Essential Goals and Objectives in 
Reading (1986). For example, both elementary and secondary teachers in 
Michigan reported spending over three hours per week on comprehension 
strategies and having students respond to what they read, and slightly more 
than one-half hour per week on basic skills, such as phonics and word recog- 
nition. Both California and Michigan teachers reported spending roughly the 
same amount of time per week on reading instruction. However, California 
teachers spent over four and one-half hours per week with students engaged 
in small group reading activities, such as working in pairs or teams and small 
group discussions. In contrast, Michigan teachers spent less than two and 
one-half hours per week engaging in these kinds of activities. These differ- 
ences in instructional practices reported by Michigan and California teach- 
ers are consistent with the emphasis given to dissimilar aspects of the 
language arts reform policies in the two states. 

Goertz et al. concluded that there is evidence of general patterns that incor- 
porate new directions in both state and national reforms but also retain atten- 
tion to more traditional topic areas. Teachers believed that they had been 
influenced by state policy instruments, such as assessments and curricular 
frameworks, but that these state influences were by no means the only or 
even the most important influences on practice. Teachers reported that their
own knowledge and beliefs about the subject matter and about their students
generally had a larger influence on their teaching than state policies.

CIERA REPORT #3-004

Other examples of policy research that reflects initial efforts to link macro-
and microlevels of analysis include the work of McDonnell (McDonnell,
1997; McDonnell & Choisser, 1997), Smith and colleagues (Smith, Noble,
Heinecke, Seck, Parish, Cabay, Junker, Haag, Taylor, Safran, Penley, &
Bradshaw, 1997), and the Kentucky Institute for Education Research (Lindle,
Petrosko, & Pankratz, 1997), all of whom studied state reforms that included
a literacy component. For example, McDonnell and Choisser examined the
extent to which policymakers’ expectations about the curricular effects of
testing in Kentucky and North Carolina proved valid in local schools and
classrooms. Their analysis was based on telephone and on-site interviews
with teachers and administrators, and examinations of assignments and daily
logs gathered from 23 teachers in each state. They concluded that transforming
instruction through assessment was not a self-implementing reform
because the tests alone lacked sufficient guidance for how teachers ought to
change.

Smith et al. took a different methodological approach, conducting a
four-year, multimethod study of the effects of the now-suspended Arizona
Student Assessment Program (ASAP). They observed in classrooms and
interviewed teachers and principals in four schools, and they used a survey
to collect data from educators across the state. The results, which
were generalized across subject areas, indicated that though educators knew
about ASAP, their responses to it varied depending on how they understood
it and how it fit with their underlying beliefs and the local conditions
(material and knowledge resources, existing beliefs and ideologies about teaching,
culture of accountability and authority). Smith et al. also concluded that the
dual focus on accountability and instructional improvement, combined with
insufficient attention to capacity building, resulted in marginal effects of the
ASAP reform agenda.

Investigations into the impact on teachers, schools, and districts.

In contrast to the large-scale investigations represented by the Goertz et al.,
McDonnell, and Smith et al. studies, some policy researchers have con- 
ducted more in-depth studies of the impact of policy instruments on teach- 
ers, schools, and districts. For example, Standerford (1997) studied two 
small districts in Michigan from 1988 to 1991. Both districts formed reading 
curriculum committees in an effort to interpret the state reading policy and 
design an official district response. To understand what happened in these 
districts, Standerford observed both the curriculum committees and the 
classroom practices of the teachers on these committees. Her results indi- 
cated that participation in the district effort was not integrated with either 
state policy or the classroom changes that individual teachers were making; 
the district rules, objectives, players, audiences, and time frames were not 
conducive to such integration. Districts’ responses to state reading reform 
were influenced by their need to reduce uncertainty, use standard operating 
procedures to effect change, advertise change by producing documents and 
plans, and respond selectively to policies based on the incentives attached. 
In contrast, the teachers made changes based on their individual profes- 
sional development activities, but were often unsure just how those changes 
fit with the state policy. 

Standerford concluded that state and district policies had influenced the
teachers’ efforts by making them aware that changes were expected in read- 
ing instruction, but had not made clear for the teachers what those instruc- 
tional changes should be. Nor did these policies offer much support for 
teachers’ efforts to figure that out for themselves. As teachers learned more 
about the new ideas, they gradually changed the enacted curriculum in their 
classrooms. Yet those instructional changes were minimally represented in 
the written curriculum that they produced as members of the district com- 
mittees, because their roles and objectives were defined differently at the 
district and classroom levels. 

In another series of studies, Spillane (1998), Spillane (1996), Spillane and 
Jennings (1997), and Jennings (1995) examined the impact of the reading 
policy in Michigan on a racially and economically diverse urban district, a 
relatively affluent suburban district, and a small group of teachers within the 
suburban district. Spillane’s (1996) study revealed that state and local policies
do not always support similar notions about instruction. The suburban
district used the revision of the state reading test as a lever to move central
administrators who preferred a basic skills curriculum in another direction.
Within a short period of time, district administrators had developed
a new curriculum guide for reading, adopted new curricular materials, 
revised their student assessment policies, and organized an extensive profes- 
sional development program about reading that went beyond state policy. In 
contrast, the state’s reading policy did not figure prominently in the reading 
program developed in the urban district. Curriculum guides supported tradi- 
tional ideas about teaching reading by encouraging teachers to teach iso- 
lated bits of vocabulary, decoding skills, and comprehension skills. A new 
basal reading program was mandated, accompanied by a traditional work- 
book that provided students with drill in reading skills. Central administra- 
tors made no effort to revise district policy on student assessment despite 
significant revisions of the state’s reading test. 

When Spillane and Jennings (1997) looked more closely at nine second- and 
fifth-grade teachers in the suburban district, they found that the extent to 
which teachers’ practices reflected the district’s literacy initiative depended 
on how well the reforms were elaborated by the district. Their initial data 
analysis suggested significant uniformity in language arts practice among the 
nine classrooms and offered striking evidence that the district’s proposals 
for language arts reform were finding their way into practice. For example, 
they found that all nine teachers were using literature-based reading pro- 
grams and trade books, engaging in activities such as Writer’s Workshop, and 
focusing on comprehension over skills-based instruction. However, early dis- 
cussions of the observation data revealed other differences that weren’t 
being captured by the analytical framework. This led to a revised analytical 
frame focused on classroom tasks and discourse patterns that helped track 
these “below the surface” differences in pedagogy. 

Comparing results using the two analytical frameworks, Spillane and Jen- 
nings showed that it is relatively easy to arrive at very different conclusions 
about the extent to which reforms that call for more ambitious pedagogy 
have permeated practice. They argue that, if reforms are meant to help all 
students encounter language arts in a more demanding and authentic man- 
ner, then policy analysts cannot rely solely on such indicators as the materi- 
als and activities teachers use. Rather, they must sit in classrooms and figure 
out what type of knowledge is supported by classroom tasks and discourse
patterns. We would add that to be able to explore these issues effectively, 
one needs to understand a great deal about the subject-matter instruction 
that is the focus of the reform. Spillane and Jennings have amassed a great 
deal of knowledge about language arts and language arts instruction over 
their years of studying reform in this area, and we would argue that, without 
this knowledge, they may never have even seen the differences that led 
them to revise their analytical framework and uncover these important dif- 
ferences in classroom practice. 



Summary and Implications 



Collectively, these studies provide insight regarding both policy research 
related to literacy standards and assessments, and the impact of literacy stan- 
dards and assessments on district and teacher practices. On the one hand, 
very few policy studies provide sufficient subject matter information to war- 
rant inclusion in this review. On the other hand, several studies probe 
deeply into the details of the classroom discourse and tasks related to lan- 
guage arts instruction, revealing important differences in teachers’ imple- 
mentation of reform efforts. In terms of the impact of literacy standards and 
assessments, it is clear that policy tools such as conceptual frameworks, cur- 
riculum guides, and assessments can and do influence district and classroom 
practice. It is also evident that the relations between language arts policy 
and practice are complex and at least partly dependent on the knowledge, 
beliefs, goals, and experience of the administrators and teachers who work 
with these types of policy tools. These findings speak clearly to the need to 
understand thoroughly the context of policy implementation from both the 
system perspective and the day-to-day lives of teachers and students. They 
also suggest that, without some form of professional development, the 
effects of such policies are highly variable. 



The Measurement Perspective 



Assessment has been part of educational reform efforts for the past 40 years 
(Linn, 1998), initially serving as an indicator of reform or progress and more 
recently serving as a lever for reform. In the 1960s, testing increased
substantially to meet the demands of evaluation and accountability for Title I.
Then in the 1970s and 1980s, measurement researchers became intimately 
involved in policy-related issues during the minimal competency testing 
(MCT) movement when high stakes were attached to test performance. In 
Florida, for example, where MCT graduation requirements gained a great 
deal of attention, test results revealed gains for low-achieving students but 
differential passing rates for African American, Hispanic, and white students. 
In addition, the Federal District Court decision in the landmark Debra P. v.
Turlington (1981) case directed that students must be provided with ample
opportunity to learn the material tested when high stakes, such as high 
school graduation, are in place. Events such as these quickly propelled 
assessment and assessment researchers into the policy arena. Following this
trend, a new movement, measurement-driven reform (Popham, Cruse, 
Rankin, Sandifer, & Williams, 1985), gained in popularity, emphasizing
large-scale assessment as a “catalyst to improve instruction” (p. 628).
Measurement-driven reform expanded the role of assessment into the policy arena in
two important ways: a) it focused attention on what students should learn 
(outcomes), and b) it made teaching toward the test a valued instructional 
strategy. 

Many measurement researchers explored the effects of early high-stakes 
assessments on student performance, curriculum, and teachers’ instruc- 
tional practices. In general, studies indicated that high-stakes standardized 
basic skills tests led to: a) a narrowing of the curriculum, b) an overemphasis 
on basic skills and test-like instructional methods, c) a reduction in effective 
instructional time and an increase in time for test preparation, d) inflated 
test scores, and e) pressure on teachers to improve test scores (Herman & 
Golan, 1993; Nolen, Haladyna, & Haas, 1992; Resnick & Resnick, 1992; 
Shepard, 1991; Shepard & Dougherty, 1991; Smith, 1991; Smith, Edelsky,
Draper, Rottenberg, & Cherland, 1990). These studies led educators and the 
public alike to question the effectiveness of educational reform efforts and 
of the assessments themselves (Linn, Graue, & Sanders, 1990). As a result of
this line of research and renewed interest in the intended and unintended 
consequences of assessment (Messick, 1989), the “alternative” or “authentic” 
assessment movement was launched. 

From past research it was clear that assessment could be a lever for reform — 
that what gets tested gets taught, and what doesn’t get tested doesn’t get 
taught. Therefore, it was reasoned that if better, more authentic assessments 
could be created to measure the “thinking curriculum” (Resnick & Resnick, 
1992), then better teaching and learning would follow. Publicly acknowl- 
edged content standards in specific subject areas would guide the content of 
the new assessments, and high performance standards, rather than norms, 
would guide goals for student achievement (NCEST, 1992). Furthermore, it 
was argued that if teachers were more involved in the development, admin- 
istration, and scoring of the assessments, there would be a greater chance 
that teaching would be enhanced. Performance assessment, portfolios, and 
projects (Resnick & Resnick, 1992) were advanced by both educators and 
measurement experts as assessment models that might foster effective teach- 
ing, learning, and measurement of worthwhile outcomes (Shepard, 1989; 
Simmons & Resnick, 1993; Wiggins, 1993). In many respects, the authentic 
assessment movement is an extension of the measurement-driven reform of 
the 1980s. Now, however, the people involved in assessment development, 
the form of assessment, and the criteria for content selection and student 
performance have changed. Furthermore, there is new interest in students’ 
opportunities to learn. 

The measurement community cautioned that the field would require new 
models for and methods of determining the technical merit of new assess- 
ments (Linn, Baker, & Dunbar, 1991; Moss, 1994), many of which were not 
yet in place. Furthermore, many argued that it was impossible to test the log- 
ical assertion that these new measures would yield more positive results 
until the assessments were in place for some time. Nevertheless, pressure 
for new, better assessments and for public accountability placed new assessments 
on a fast track. By 1997, 46 out of 50 states had some form of statewide 
assessment, and 36 of those included extended responses typical of 
performance assessments (Roeber, Bond, & Braskamp, 1997). Linn (1998) 
has suggested that policy-makers have placed enormous emphasis on assess- 
ment reform because it is relatively inexpensive and easy to mandate, can be 
implemented rapidly, and is easily reported by the press, when compared to 
the type of professional development and restructuring/reculturing of schools 
required to effect deep, second-order educational change (Fullan & Miles, 
1992). So in the 1990s, we have witnessed an enormous increase in new 
assessments being used as the levers for reform. 

The research we review in this section falls into two general categories: on-demand 
forms of performance assessment and classroom-based assessments 
such as portfolios. We use the term “on-demand performance assessments” 
to define uniform assessments administered under controlled 
conditions; they are usually given on a particular day or days under standard 
conditions across classrooms, schools, and districts. Most statewide assess- 
ments in reading and writing are on-demand assessments. Recent efforts 
have focused on improving the quality of the assessment tasks and expand- 
ing response modes while simultaneously trying to maintain high levels of 
reliability and validity. In this section of the review, we include research on 
on-demand performance assessments that require students to demonstrate 
higher-order cognitive processes and to provide some extended responses 
to comprehension questions or to write in response to a prompt. We do not 
include research on more traditional assessments comprised only of multi- 
ple-choice items. For the second category — classroom-based assessments — 
we include assessment evidence that is systematically collected as an ongo- 
ing part of the instructional program. In some cases, the evidence is scored 
and then reported for accountability purposes either at the state or school- 
district level. Because we are focusing on policy-related research, we do not 
include research on individual classroom assessment projects. 



Research 



On-demand performance assessment. The first attempts at performance 
assessment in literacy can be traced back to the 1960s and the use of direct 
writing assessment instead of indirect measures such as multiple-choice tests 
(cf. Freedman, 1993). Direct writing 
assessment requires students to write in response to an assigned topic under 
timed conditions; papers are scored using a standard rubric. Many statewide 
assessments (Roeber et al., 1997) and the National Assessment of Educational 
Progress still use this approach with considerable success. Measurement 
researchers have focused on issues of interrater reliability and 
generalizability with respect to scoring writing samples. Interrater reliability 
is generally high, although studies indicate it can vary from .3 to .91 (Dun- 
bar, Koretz, & Hoover, 1992; Hieronymus, Hoover, Cantor, & Oberley, 1987; 
Welch, 1991). Measurement researchers seem to understand how to raise 
reliability to an acceptable level by implementing more extensive training of 
carefully selected scorers, more specific scoring guidelines, and the like 
(Mehrens, 1992; Miller & Legg, 1993). Issues of generalizability across 
modes of writing or even topics within modes are not as clear, however, and 
continue to present challenges for measurement experts (Dunbar, Koretz, & 
Hoover, 1992; Herman, 1991; Hieronymus, Hoover, Cantor, & Oberley, 
1987). Language arts educators, however, are now raising questions regarding 
the authenticity of direct writing assessment and the validity of the 
results when students are required to write under these unnatural, on- 
demand conditions (e.g., constrained by time, topic, audience, and process) 
(Freedman, 1993; Lucas, 1988a, 1988b). Although these criticisms are 
appealing on the surface, Messick (1994), a noted measurement researcher, 
pointed out that concerns about both authenticity and directness need to be 
supported empirically rather than simply claimed. This is a good example of 
how differences in perspective shape the questions and the nature of the 
evidence sought. 

Although direct writing assessment is still a mainstay of many assessment 
programs, more recent efforts at performance assessment in reading and 
writing go further, including longer and more complex reading selections 
from a variety of genres, higher level comprehension questions, extended 
written responses, and cross-text analyses. The few studies from a measure- 
ment perspective that are available on new statewide assessments (e.g., 
Maryland, Kentucky, Arizona) do not distinguish among reading, language 
arts, and mathematics in design or analyses, making it difficult for literacy 
educators to interpret the implications for curriculum, instruction, or 
research. For example, in two parallel studies, researchers at RAND (Koretz, 
Barron, Mitchell, & Stecher, 1996; Koretz, Mitchell, Barron, & Keith, 1996) 
used telephone and written surveys to examine the influence of the Mary- 
land School Performance Assessment Program (MSPAP) and the Kentucky 
Instructional Results Information System (KIRIS) — both of which had assess- 
ments in several subject areas. By focusing only on responses of elementary 
teachers included in these reports, we can get some idea of language-arts- 
related results. Across both studies, teachers supported the new assess- 
ments, even in terms of encouraging reluctant teachers to change; however, 
they did not support the use of test results for accountability. On the posi- 
tive side, teachers aligned curriculum with the assessments, especially 
spending more time on writing (a dominant response mode for both assess- 
ments), although they felt that more specific curriculum frameworks would 
be helpful. On the negative side, teachers reported spending considerable 
time in test preparation activities and a tendency to de-emphasize untested 
material. Data from both studies indicated that teachers' expectations rose 
for high-achieving students rather than low-achieving students and that 
teachers credited student gains to specific test practice and test familiarity 
rather than to true improvements in capabilities. These findings led the 
researchers to call for further research on issues related to the specificity of 
the frameworks, effects on equity, inflated test scores, and the validity of the 
measures. 

One of the few studies of on-demand assessment to report specifically by 
subject area is based on data from the New Standards Project, a multistate 
effort designed to involve educators in the creation of state and district per- 
formance-based assessments in reading, writing, and mathematics (Resnick, 
Resnick, & DeStefano, 1993). This shared emphasis on new assessments and 
professional development involved teachers in the development, piloting 
and scoring of the assessments. Researchers found “moderate” interrater reli- 
ability for both the reading and writing sections of the test — too low to use 
for judging students or educational programs. More interestingly, reliability 
varied depending on the task being scored, the approach to calculating reli- 
ability (correlation or agreement), and the scoring method used (holistic or 
a combination of analytic and holistic). Moreover, individual students’ scores 
varied depending on the scoring method used. The researchers suggested 
that more intensive training, a more selective scoring team, clearer rubrics, 
and better exemplars might improve interrater agreement. These findings 
mirror the findings discussed earlier related to direct writing assessment. 
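
The difference between the two approaches to calculating interrater 
reliability (correlation or agreement) can be illustrated with a small 
computation. What follows is only a minimal sketch and is not drawn from any 
of the studies reviewed here: the four-point rubric, the scores, and the 
function names are all hypothetical. It suggests how two raters who rank 
papers almost identically may show a high correlation while rarely assigning 
identical scores. 

```python
from math import sqrt
from statistics import mean


def exact_agreement(scores_a, scores_b):
    """Proportion of papers on which both raters gave the identical score."""
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def pearson_r(scores_a, scores_b):
    """Pearson correlation between the two raters' scores."""
    mean_a, mean_b = mean(scores_a), mean(scores_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    var_a = sum((a - mean_a) ** 2 for a in scores_a)
    var_b = sum((b - mean_b) ** 2 for b in scores_b)
    return cov / sqrt(var_a * var_b)


# Hypothetical scores on a 1-4 rubric for ten papers (invented for illustration).
rater_1 = [1, 2, 2, 3, 3, 3, 4, 4, 2, 1]
# The second rater is usually one point more lenient, capped at the top score.
rater_2 = [2, 3, 3, 4, 4, 4, 4, 4, 3, 2]

print("exact agreement:", exact_agreement(rater_1, rater_2))
print("correlation:", round(pearson_r(rater_1, rater_2), 2))
```

In this invented case the correlation is high while exact agreement is low, 
so a report citing only one index would describe the same ratings quite 
differently. 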

Classroom-based assessment. The classroom-based measurement research has 
largely been conducted on portfolios. Interest in portfolios and policy stemmed 
from an attempt to join the advantages of classroom-embedded assessment with 
the need for large-scale public accountability. From the beginning, many were leery of trying 
to accomplish both purposes with one instrument, but the advantages in 
terms of teacher development, instructional practice, and student engagement 
motivated educators to try (Aschbacher, 1994; Haney, 1991; Mehrens, 
1998; Valencia, 1991). 

The most widely studied of the large-scale portfolio projects is the Vermont 
Portfolio Assessment Program, although more measurement researchers 
have studied the mathematics portfolio than the writing portfolio (Koretz, 
McCaffrey, Klein, Bell, & Stecher, 1993; Koretz, Stecher, & Deibert, 1992; 
Koretz, Stecher, Klein, & McCaffrey, 1994; Koretz, Stecher, Klein, McCaffrey, 
& Deibert, 1993). Because statewide assessment was new in Vermont, this 
project was conceptualized as a system that would take hold gradually — it 
would be decentralized and “a very long effort” (Mills & Brewer, 1988, as 
cited in Koretz et al., 1994). According to state officials, it was designed to 
support sound educational practice, encourage professional development of 
education, encourage local autonomy, and provide comparable information 
across schools. The writing assessment was designed to be administered in 
grades 4 and 8 (in 1994-95, the writing assessment was moved to grades 5 
and 8) and is comprised of two main components in writing: a) a portfolio 
of student work which includes a set number and specified types of pieces 
of writing collected over the course of a year, and b) a “uniform test” of writing 
(i.e., a standard prompt to which all students respond using standard procedures). 
The portfolio contents and the Uniform Test are scored by a wide 
range of Vermont teachers other than the students’ own, using an analytic 
scoring rubric. 

Studies of reliability indicated interrater correlations ranging from .46 to .63 
(45 percent agreement based on exact match) depending on how the scores 
were aggregated (e.g., within or across scoring dimensions; by individual 
piece or across sections of the portfolio). This finding led researchers to con- 
clude that the state could not report on the percentage of students scoring 
at each point on each of the writing traits, nor could it provide comparative 
data across districts and schools (Koretz, Stecher, Klein, McCaffrey, & Deib- 
ert, 1993). Researchers suggested that inadequate rubrics, insufficient train- 
ing of scorers, and lack of standardization of portfolio tasks most likely 
contributed to the lack of reliability. 

In terms of validity, the Vermont results were “not persuasive” (Koretz, 
Stecher, Klein, & McCaffrey, 1994). The correlations between the portfolio 
scores and the Uniform Writing assessment were moderate, as one might 
expect from other research; however, these same levels of correlation were 
found between writing portfolios and a multiple-choice math test. In addition, 
they found little difference between scores on papers students selected 
as “best pieces” and scores for the rest of the writing portfolio, a finding that 
is inconsistent with other evidence suggesting lack of generalizability across 
different writing tasks (Dunbar, Koretz, & Hoover, 1991). Validity was also 
brought into question in terms of portfolio implementation. In keeping with 
local autonomy, researchers found great variability in teachers’ implementa- 
tion of portfolios, resulting in a wide range of types of work included in the 
portfolios as well as a wide range of teacher support for the work, all of 
which would raise validity questions. Principals reported that, although the 
assessment system placed sizable demands on schools for resources (especially 
in the area of time and support), they thought it was a worthwhile 
burden (Koretz et al., 1993). Because they felt the burden fell primarily on 
the teachers, the majority of principals provided release time to help ease 
the stress. 

In contrast to statewide initiatives on portfolios, several school districts have 
tried to implement literacy portfolios with the dual focus of accountability 
and improvement of instruction. Measurement researchers have studied 
both the ARTS PROPEL middle/high school writing portfolios in Pittsburgh 
and early literacy portfolios in Rochester, New York. Portfolios from Pitts- 
burgh Public Schools (LeMahieu, Eresh, & Wallace, 1992) grew out of the 
ARTS PROPEL project, a privately-funded project to design instruction-based 
assessment in visual arts, music and imaginative writing. The writing portfolios 
were compiled by students in grades 6-12 from a folder of their classroom 
writing. Using a set of guidelines, students selected four pieces 
(including drafts as well as final copies) and provided several written reflec- 
tions on their writing processes, rationale for their selections, and the crite- 
ria they used for judging their work. As a result, there was less required 
commonality across portfolios than in the Vermont Portfolio Assessment. 
Portfolios were scored by a small group of highly trained district teachers 
and administrators using a rubric that reflected a decade-long district-wide 
history of professional development in writing. Judges assigned identical 
scores for 45 to 56% of the portfolios. Interrater correlations ranged from .80 
to .84 across three scoring dimensions (accomplishment, processes/strate- 
gies, and growth). In addition, researchers found that portfolio scores were 
highly related to the classroom writing opportunities students were 
afforded. Students in classrooms judged to have teachers with an “intense” 
writing practice scored significantly better than those in classrooms judged 
to be moderately or not at all intense. Interestingly, portfolio scores were 
more strongly correlated with a standardized reading test than with a stan- 
dardized direct writing measure. 

The Rochester portfolios grew out of curriculum reform initiated three years 
before portfolios were sequentially implemented in the primary grades. The 
portfolios were designed by teachers to be scored by classroom teachers 
rather than by external scorers. They include both required pieces (e.g., 
writing samples, letter-sound assessments, observations, and anecdotal 
records) and optional pieces for reading and writing collected on a regular 
schedule. Teachers scored the portfolios and then assigned each child to a 
developmental stage specified in a “rubric.” Supovitz, MacGowan, and Slat- 
tery (1997) compared the ratings given by Rochester classroom teachers 
and outside evaluators. They found interrater correlations between .58 and 
.77 with more consistency in reliability for writing than for reading. They 
found that external reviewers had difficulty scoring “thin” evidence found 
for reading both because few reading pieces were required in the portfolios 
and because teachers rarely included the required (or any additional) reading 
evidence. When reading evidence was included, they found that outside 
raters had difficulty judging the work and applying judgments to develop- 
mental levels. Their findings also suggested that classroom teachers scored 
students significantly higher than outside raters in the area of reading, where 
the lack of portfolio evidence was most likely supplemented by teachers’ 
knowledge. There were no significant differences across scorers in writing. 
In a second study, Supovitz and Brennan (1997) found that gender, socioeco- 
nomic, and racial inequities existed when portfolio performance was com- 
pared to standardized test performance, although the Rochester portfolios 
closed the gaps between blacks and whites and widened the gaps between 
boys and girls. 

Questions about the variability in portfolio contents across students have 
been raised with respect to the influence on reliability, but the issue is perti- 
nent with respect to validity as well. If students receive different levels of 
support or if evidence is simply not available, then judgments about students’ 
abilities will be open to question. Two studies shed light on this point. 
In one, Herman, Gearhart, and Baker (1993) were able to get satisfactory lev- 
els of interrater agreement for portfolios containing only narrative and sum- 
mary writing, but they discovered that students’ scores were substantially 
different across different contexts (standard writing prompt vs. portfolio 
work; analytic vs. holistic rubrics; scoring of individual pieces vs. the total 
portfolio; narrative vs. summaries). In fact, two-thirds of the students classified 
as competent using portfolio scores were not judged competent on the 
standard writing assessment. This led the researchers to question the validity 
of portfolio scores and to look further behind the actual work. So, in another 
study, Gearhart, Herman, Baker, and Whittaker (1993) asked “Whose work is 
it?” of the writing contained in students’ portfolios. Teachers were asked to 
rate the level of instructional support for writing assignments in students’ 
portfolios (grades 1-6). They found variability in the amount of support that 
teachers provided to students, the time students spent on assignments, and 
the extent to which work was copied. Furthermore, students received differ- 
ent levels of support depending upon whether they were low- or high- 
achieving, and teachers with more portfolio experience provided more 
teacher support. Not only was student work influenced by the level of 
teacher support, but this support was provided differentially across students 
and classrooms. 

In an effort to look more closely at classroom-embedded performance assess- 
ment, Shepard and her colleagues (Shepard, Flexer, Hiebert, Marion, May- 
field, & Weston, 1996) examined the effects of a professional development 
project which was designed to help teachers use performance assessments 
as part of regular instruction in reading and mathematics. They reasoned that 
embedded performance assessments would improve learning by introduc- 
ing challenging tasks that were consistent with curricular goals and by help- 
ing teachers clarify their understanding of their students, thereby informing 
their instruction. This study represents a shift from the other studies in this 
section in two important ways: (a) It integrates expertise in subject matter, 
teacher change and assessment in the design, implementation and analysis; 
and (b) it integrates professional development with a study of new assess- 
ments and student learning. Although the authors reported no gains in stu- 
dent learning in reading on the outcome measure (Maryland School 
Performance Assessment Program), they offered explanations that are consistent 
with other studies in both the policy and literacy sections. Specifically, 
they found that, although teachers were familiar with the district 
curricular framework before the project began, their motivation and instructional 
practices were not congruent with the framework. What the researchers 
conceived of as a professional project to introduce classroom 
performance assessment evolved into more of a project on literacy instruc- 
tion and assessment. They concluded that performance assessments alone 
are not enough to improve teaching; long-term professional development is 
also needed. 



Summary and Implications 



Measurement researchers tend to focus on the feasibility of new assessments 
from a technical perspective (reliability and validity) and on their desirability 
(consequential validity), often relying on statistical procedures and self- 
reports. Like policy researchers, measurement researchers generally have 
not distinguished among different subject areas in their targets for study or 
in their conclusions and recommendations, even though Linn (1998) (a 
prominent measurement researcher) has found differences in student perfor- 
mance across subject areas and within subscales of the same subject area. 
Overall, the measurement research on literacy assessment reform suggests 
that there is still uncertainty about the ability of performance assessments to 
provide reliable and valid data for accountability. As we might expect, con- 
textual factors such as the nature of the classroom instruction and support 
provided to students are difficult to control across classrooms. Another 
source of difficulty is the degree of teacher involvement and teacher choice 
in the assessment. Several studies suggest that more standardization in the 
assessment artifacts, more expertise in the scorers, or more specificity in the 
outcomes could address problems of reliability and validity. However, these 
suggestions fly in the face of an important rationale for new standards and 
assessment: the professional development of teachers that is fostered 
through their involvement in the development and scoring of new assess- 
ments. As for the issue of consequential validity, data indicate that new 
assessments and standards have some influence on teachers’ practices but 
that the accountability factor creates stress for teachers and raises questions 
about how well they implement new assessments. Whether we look at 
issues of feasibility or desirability of new assessments, the studies in this sec- 
tion highlight the tension between assessment for accountability and assess- 
ment for instructional improvement. They also raise questions about the 
quality and the influence of assessment as a policy tool (see Linn, 1998 and 
Mehrens, 1998 for more in-depth measurement perspectives across subject 
areas). 



The Literacy Perspective 



Literacy has been a centerpiece in efforts to push ambitious reforms in 
teaching and learning. National reports such as A Nation at Risk noted the 
failure of schools to provide the nation with a more literate populace as evidenced 
by allegedly declining verbal SAT scores and less than encouraging 
results of National Assessment of Educational Progress (NAEP) reading 
assessments. At the same time, the research community expressed concern 
about the skills-based conceptualizations that were guiding assessment and 
instruction in reading (see Curtis & Glaser, 1983; Guthrie & Kirsch, 1984; 
Linn, 1986); they were less concerned with writing because, as we noted 
previously, writing process and direct assessment of writing had gained popularity 
in the 1970s. These concerns led the National Academy of Education’s 
Commission on Reading to issue the report entitled Becoming a Nation of 
Readers (BNR) (Anderson, Hiebert, Scott, & Wilkinson, 1985). The essence 
of BNR was that reading is a holistic, constructive process rather than the 
aggregate of a series of isolated subskills, and that curriculum, instruction, 
and assessment should reflect this view of reading. In many respects, BNR 
represented a subject-matter specific version of the national reports calling 
for more attention to higher-order thinking, and it became a conceptual 
framework for literacy researchers who were becoming involved in local, 
state, and national policy initiatives. 

A constructivist view of reading was also evident in a number of state efforts 
to develop curriculum frameworks, objectives, and assessments in reading 
and language arts. For example, in 1984, Michigan put forward a “new” defi- 
nition of reading as “the process of constructing meaning through the 
dynamic interaction among the reader, the text, and the context of the read- 
ing situation” (Wixson & Peters, 1984). This definition then served as the 
basis for new state Essential Goals and Objectives in Reading (1986). Given 
that BNR was written in Illinois, and Michigan was promoting a similar con- 
ceptualization of reading through its new definition, it is not surprising that 
these two states led the way in developing statewide reading assessments 
that better reflected constructivist reading theory and knowledge (Valencia, 
Pearson, Peters, & Wixson, 1989; Wixson, Peters, Weber, & Roeber, 1987). 

Literacy researchers and state curriculum specialists worked with measure- 
ment specialists in Michigan and Illinois to develop a new generation of 
reading assessments consistent with new views of reading and new student 
outcomes (Peters, Wixson, Valencia, & Pearson, 1993). Although there was a 
precedent for this type of collaboration in the development of the NAEP 
tests, NAEP had little impact on the development of curriculum, instruction, 
and assessment at state, district, and school levels because it had been 
designed to provide information only at the national level. Rather, it was the 
large-scale reform efforts of the 1980s described previously in this chapter 
that brought together literacy researchers and curriculum and measurement 
specialists to effect the types of changes being called for by the research 
community, policymakers, and the public at large. 

The constructivist perspective on reading being promoted in various 
national and state policy documents also influenced the development of a 
new Reading Framework for NAEP reading tests. The 1992 Reading Frame- 
work, which was also used in 1994 and 1998, indicates that “Reading for 
meaning involves a dynamic, complex interaction among three elements: 
the reader, the text, and the context” (NAGB, n.d., p. 10). Although NAEP 
continued to use direct writing assessment as it had done in the past, it 
began to include some open-ended reading items to address the new defini- 
tion of reading and recommendations for new forms of reading assessment. 
The first year this framework was in effect was also the first year of the voluntary, 
trial program to administer NAEP in a way that allowed for state-by-state 
comparisons, and the year in which the NAEP special study on oral 
reading fluency was conducted (Pinnell, Pikulski, Wixson, Campbell, Gough, 
& Beatty, 1995). Both of these innovations came in response to demands by 
many that NAEP respond to the need for higher levels of accountability in 
general and specifically in the area of reading. 

With the publication of the California English-Language Arts Framework in 
1987, the treatment of reading and writing as separate subject areas began 
to give way to the idea of integrated language arts consisting of listening, 
speaking, reading, and writing. “Language arts became a discipline con- 
cerned with major universal themes, the human condition, exploring life 
experiences, and social agendas introduced through quality literature” 
(Gonzales & Grubb, 1997, p. 696). The California Framework pushed the definition 
of reading and language arts beyond a purely constructivist perspective 
to a more social-constructivist perspective with its emphasis on 
“transactions” as opposed to interactions with text and considerations of the 
sociocultural experiences students bring to text. 

Prominent among the policy initiatives adopted by the California State 
Department of Education to support the framework was the California 
Learning Assessment System (CLAS). The CLAS reading assessment was 
designed to evaluate the success of the language arts curriculum, and to 
help districts and schools understand how well students were internalizing 
the strategies that encourage them to construct understandings beyond the 
school setting. CLAS took seriously the call for integration of the language 
arts by including open-ended written responses to reading selections, hav- 
ing students work collaboratively on some sections of the assessment, and 
tying some of the direct writing prompts to reading selections (Weiss, 
1994). 

In 1992, when the new CLAS assessments were being implemented, the 
contract to develop national English language arts standards was awarded by 
the U.S. Department of Education (DOE) to the Center for the Study of Read- 
ing at the University of Illinois, in collaboration with the International Read- 
ing Association (IRA) and the National Council of Teachers of English 
(NCTE). Before these standards were completed, however, the contract was 
terminated by the U.S. DOE for lack of satisfactory progress, reflecting differ- 
ences in perceptions about what constitutes appropriate standards in 
English language arts. The project continued under the auspices of IRA and 
NCTE and concluded with the publication of the Standards for English Lan- 
guage Arts in 1996. 

Consistent with the California Framework, the NCTE/IRA standards defined 
English language arts as listening, speaking, reading, writing, viewing, and 
representing. By the time the NCTE/IRA standards were published in 1996, 
there was widespread concern about the direction that constructivist and 
sociocultural views of teaching and learning were taking curriculum, 
instruction, and assessment in all subject areas including reading. In 1994, 
yielding to pressure from conservative groups, Governor Wilson vetoed leg- 
islation to continue funding for CLAS (Gonzales & Grubb, 1997), and the 
results of the 1992 and 1994 NAEP state-by-state comparisons placed Califor- 
nia close to the bottom of the rankings in reading (Campbell, Donahue, 
Reese, & Phillips, 1996; Mullis, Campbell, & Farstrup, 1993). The “whole lan- 
guage” California framework was blamed for the failure of many California 
children to learn to read, fueling a nationwide resurgence of the phonics vs. 
nonphonics “reading wars” which have surfaced every 10 or 20 years since 
the turn of the century (e.g., Chall, 1967; Flesch, 1955). 

Most recently, there has been a shift away from attention on comprehension, 
writing, and integrated language arts to early reading, especially phonemic 
awareness and phonics. In 1998, some policymakers and educators pro- 
moted a return to skills-based definitions of reading that emphasized decod- 
ing as the only or primary concern of reading instruction. At the state level, 
there was a virtual firestorm of legislation focused on early reading that 
included new curriculum frameworks and standards, assessment mandates, 
textbook adoption guidelines, and mandates for teacher credentialing and 
professional development. For the first time since its inception in the mid- 
1970s, the U.S. DOE funded the national Center for the Improvement of 
Early Reading Achievement (CIERA) in 1997. At the same time, the National 
Research Council of the National Academy of Sciences commissioned a
blue-ribbon panel report on Preventing Reading Difficulties (National Research
Council, 1998) and the National Institute of Child Health and Human Development
(NICHD) impaneled a group of experts to point educators and policymakers 
to the best research in reading instruction. With these recent events, 
national involvement in standards, assessment, and instructional strategies 
gained momentum. 

The policy-oriented research we review in this section focuses on three 
areas related to literacy standards and assessments: on-demand reading and 
writing assessments, classroom-based assessments such as portfolios, and 
statewide language arts content standards. What distinguishes policy- 
oriented research conducted by literacy researchers from that conducted by 
policy and measurement researchers is a primary emphasis on the subject- 
matter content of standards and assessment and their validity and consis- 
tency with current literacy theory and research. Literacy researchers also 
focus on how policies shape classroom literacy practices, frequently gather- 
ing direct evidence of teaching and student learning as well as self-reported 
responses to policy. Literacy researchers tend to be less interested in the 
overall, or more systemic, effects of policy and reform. 



Several literacy researchers have explored the influence of on-demand liter- 
acy assessments. In a series of studies, literacy researchers at the National 
Reading Research Center (NRRC) (Afflerbach, Almasi, Guthrie, & Schafer, 
1996; Almasi, Afflerbach, Guthrie, & Schafer, 1995; Guthrie, Schafer, 
Afflerbach, & Almasi, 1994) investigated the effects of the Maryland State 
Performance Assessment Program (MSPAP), a multipronged reform effort 
that includes learning outcomes, a performance assessment, guidelines for 
school decision-making, and suggestions for staff development. Using semi- 
structured interviews, similar to those used by measurement researchers, 
they found that, one year after implementation, there was some limited 
understanding of the Maryland learning outcomes among county/district
language arts administrators but no widespread consensus on the reading/language
arts outcomes included in the MSPAP. Nevertheless, the administrators
believed that the performance assessment was moderately aligned with
their local curriculum.

Although there was little to no reported change in school governance or 
teacher decision-making, administrators did report some change in instruc- 
tional practices, including integrating reading and writing within content 
areas and the use of more trade books. In schools nominated as implement- 
ing positive innovations in response to the MSPAP, teachers and administra- 
tors reported changes in instructional tasks, methods, materials, and 
learning environments which reflected the nature of the MSPAP and learner 
outcomes in literacy. They also reported administrative support for change, 
including professional development, and a positive influence on students’ 
motivation for reading and writing. The researchers identified several barri- 
ers to implementation even in the schools that were identified as successful 
implementors: lack of alignment between classroom instruction and assess- 
ment and the MSPAP assessment; insufficient resources such as time and 
money for professional development; testing logistics; and communication 
between the state and schools about the rationale and nature of the assessment
program. They suggested that better communication and support are
needed between the state and local school districts if implementation is to be
effective. Change, they argued, requires more than development of assess- 
ment materials and procedures. 

Other researchers have examined more directly the relationship between 
new statewide writing assessments and classroom instruction. Two studies 
highlight the difficulties in achieving effective reciprocity between instructional
practice and assessment. Goldberg, Roswell, and Michaels (1995/1996)
examined whether the MSPAP, which required students to engage in
the writing process (including drafting, peer response, revision, and writing
a final draft), produced improved performance in writing. Specifically, the
researchers were interested in the extent to which students engaged in
effective peer response, revision, and final drafting during testing. Using
results from the MSPAP and observations of test taking in grades 3, 5, and
8, they found that students did not use revision or peer response to improve
their final writing; their changes were minimal and focused on surface-level 
features, and their peer responses were unengaged. Goldberg et al. sug- 
gested that the constraints of large-scale assessment (i.e., assigned topics, 
limited and prescribed time blocks, use of revision and response work- 
sheets, collaboration with assigned partners rather than classmates) may 
inhibit students’ motivation and ability to engage in revision and peer 
response. They concluded that testing situations may not be able to mirror 
some aspects of good instructional practice. 

Similarly, Loofbourrow (1994) conducted a case study of how two eighth-grade
teachers interpreted the California Assessment Program (CAP) in writing
and enacted it in their classroom instruction. This study was conducted
at a time when CAP was a high-stakes assessment and when it was not 
aligned with California curriculum guidelines for teaching writing. She 
found that, when there was misalignment between a high-stakes test and 
statewide recommendations for curriculum and instruction, teachers 
attended more to the form and content of the assessment. In this case, 
although middle-school teachers had students write across a wider variety of 
genres (an emphasis of both CAP and instructional recommendations), most 
of their writing assignments mirrored the test-like setting of CAP (e.g., limited
time, one- to two-page writing assignments, teacher-assigned topics,
focus on one of the eight CAP modes, emphasis on form over function).
Many sound curricular and instructional recommendations were put aside as 
teachers attended to the specific form and content of the assessment. 

Allington and his colleagues have taken a different approach to on-demand 
assessments, exploring the effects of on-demand assessment policies on special
needs students and on the system as a whole. In several studies, Allington
and McGill-Franzen (1992a, 1992b; McGill-Franzen & Allington, 1993)
highlighted the changes in the incidence of retention, remediation, and 
identification of students as handicapped across a 10-year period when New 
York State increased high-stakes assessment and accountability. An increas- 
ing proportion of elementary children were retained or identified as handi- 
capped in grades K-2, the grades that preceded the grade-3, high-stakes 
reading assessment. There was no corresponding trend for remediation. The 
researchers suggested that, although it was unlikely that the reform was 
intended to increase numbers of children retained or placed in special edu- 
cation, the net effect was that these low-achieving students were removed 
or delayed from the accountability stream. As a result, scores at the targeted 
grades were likely to rise without improved learning — the sample of stu- 
dents tested was simply limited. In fact, these researchers found that across 
all grades, schools that had been historically low-performing but seemed to 
be improving since the implementation of high-stakes assessment, had three 
times the number of students identified for special education or retained as 
compared with historically high-performing schools. Retention and identifi- 
cation, they argued, are expensive and ineffective ways to produce real 
gains. 

Classroom-based assessment. Most studies of classroom-based literacy assessment
have investigated the effects of portfolio assessment; some have focused on
statewide policies, and others on district-wide efforts. Two interesting lines of
research have come from literacy researchers who have examined effects of statewide
writing portfolio assessments in Vermont and Kentucky. Studies of the Vermont
writing portfolio (Lipson, 1997; Lipson & Mosenthal, 1997; Mosenthal,
Lipson, Mekkelsen, Daniels, & Jiron, 1996; Mosenthal, Mekkelsen, & Jiron, 
1997) examined teachers’ perspectives on the influence of the portfolio 
mandate and how they used the portfolios in classroom instruction and 
assessment. Unlike the research on Vermont portfolios cited in the previous 
assessment section, this work focused specifically on the writing portfolio 
and used a combination of surveys, interviews, and in-depth case studies 
and observations of 12 teachers. In addition, these researchers analyzed 
their findings in terms of teachers’ different theoretical perspectives and 
beliefs about writing instruction. 

Surveys were administered to fifth-grade teachers before and after the first 
year of implementation. The majority of teachers reported that their writing
instruction had improved; they incorporated more writing into their class- 
rooms and, once more writing was in place, they used the portfolio scoring 
criteria as part of their instructional talk with students. Although teachers 
embraced portfolios for instructional purposes, they did not seem to use 
portfolios or the criteria for assessing writing in the classroom, which sug- 
gested that they may not have a “shared standard” for student performance. 
Furthermore, teachers were strongly opposed to the scoring and public 
reporting of results.

Most interestingly, surveys and observations revealed that teachers with different
beliefs about teaching writing changed in different ways. Those who
already emphasized writing processes in their instruction felt most positively
about the state assessment but changed very little because they had little
need to change. In contrast, those teachers who were more dependent
on curriculum and less child-centered used portfolios for organizing writing,
but did not integrate them into their teaching. Finally, those teachers who
paid little attention to writing processes or portfolios were most negative
and changed their practices very little.

The Kentucky writing portfolios have also been studied by literacy research- 
ers. Bridge, Compton-Hall and Cantrell (1997) replicated a 1982 study to 
determine changes in the amount and type of writing elementary students 
were engaged in and the nature of writing instruction provided by teachers. 
They studied changes in one school district using written surveys of more 
than 200 teachers and classroom observations of teachers’ instruction and 
writing activities of targeted students in 12 classrooms. Across both observa- 
tions and surveys, they found a twofold increase in the amount of time stu- 
dents spent engaged in writing as compared to 1982; the biggest increase 
occurred at grade 1. This finding confirms results of other studies of reform
in Kentucky (Bridge, 1994; Raths & Fanning, 1993).

Bridge et al. also looked closely at the quality of the writing. They found a sizable
increase in the amount of time spent on higher-level writing tasks such as 
crafting and revising, and a decrease in the time students spent filling in 
words on worksheets or copying, which was dominant in 1982. In addition, 
teachers reported major changes in the way they responded to students’ 
writing, shifting to greater use of teacher and group conferences and a 
decrease in assigning grades to students’ writing. Teachers reported that, in 
large part, changes in their writing instruction could be attributed to the 
Kentucky assessments, although the authors acknowledged that most teach- 
ers were more knowledgeable about the writing process in 1995 than in 
1982. Although more than 50 percent of teachers reported substantive 
changes in their writing instruction since the Kentucky Education Reform 
Act (KERA), about one-third of them reported little change because their 
instruction was already in line with the new assessments. Like the Vermont 
studies, this study highlights the increase in amount of writing and the dif- 
ferential impact of policy on teachers whose instruction is more or less 
aligned with new assessments. 

A slightly different perspective on the Kentucky reform comes from a case 
study of nine high school English teachers faced with implementing the 
state portfolios in the second year of the mandate (Callahan, 1997). This 
study found that teachers had not yet received much professional develop- 
ment regarding the assessment and that they viewed the portfolio as a “test 
of their competence.” Consequently, although the assessment did change 
the amount and kind of writing students did to fit with portfolio require- 
ments and prompted teachers to internalize and use scoring criteria during 
instruction, teachers put their energy into “the visible, procedural elements 
of the assessment” rather than integrating it into their instruction. It 
remained a separate and intimidating burden to them. This lack of attention 
to professional development is also reported in studies by Gooden (1996) 
and by Miller, Hayes, and Atkinson (1997). Their work suggests that, in some
states, having a high-stakes assessment in place was assumed to be sufficient
to promote teacher change or to encourage local districts to provide support
for change. Unfortunately, this did not happen.

Several literacy researchers have focused on classroom-based assessment at 
the district level (Hoffman, Worthy, Roser, McKool, Rutherford, & Strecker,
1996; Salinger & Chittenden, 1994; Valencia & Au, 1997). In all three of these
studies, the researchers were interested in whether assessments could serve 
the dual purpose of improving instruction and providing accountability 
information. In all cases, the researchers worked directly with teachers and 
school districts in ongoing professional development activities focused on 
literacy curriculum and instruction as well as assessment implementation. 
Two studies focused on early literacy assessment at the district level: the South
Brunswick Early Literacy Portfolio (Salinger & Chittenden, 1994) and the Primary
Assessment of Language Arts and Mathematics (PALM) (Hoffman, Worthy,
Roser, McKool, Rutherford, & Strecker, 1996). The South Brunswick
portfolio included specific content aimed at early literacy (K-2), and specific
procedures and timelines for data collection. Using a developmental scale,
teachers rated students on one component: strategies for making sense of
and with print. The PALM model was somewhat different in that it combined 
three assessment elements: classroom-embedded assessments, a week-long 
on-demand assessment, and “taking a close look” assessments, which teach- 
ers used to gather additional information on particular students. The on- 
demand assessment and a developmental profile based on classroom-embed- 
ded and “take a closer look” information were scored. Using a combination 
of artifacts, interviews, and documents as data sources, Hoffman et al. and 
Salinger and Chittenden combined qualitative and quantitative analyses. In 
both studies, teachers reported that they could use assessment information 
for instructional purposes and that using these assessments was consistent 
with and enhanced their practice. Although teachers at both sites struggled 
with management and time issues, they all felt that the results justified the 
effort. Teachers viewed participation in professional development as critical 
to their success. Student data from both sites could be scored reliably,
making the assessments useful for accountability purposes at a district level. 
In addition, statistical analysis of the PALM data revealed that all three components
of PALM (classroom-embedded, on-demand, and taking a closer
look) contributed significantly to the prediction of students’ scores on a 
norm-referenced reading test. 

The third example in this group of district-level efforts involved a cross-district
study (Valencia & Au, 1997; Au & Valencia, 1997). This approach is different
from others in that it addressed the question of whether common but
not identical curriculum standards and portfolio structures could produce 
effective cross-site analysis and cross-site teacher learning. In addition, it 
examined the contextual factors that influenced portfolio implementation 
and used a portfolio model in which students chose a substantial number of 
pieces. Valencia and Au found that classroom portfolios contained artifacts 
consistent with the constructivist literacy curriculum frameworks at both 
sites. Although teachers were expected to include several required or “on- 
demand” pieces, some portfolios were missing needed evidence. This was a 
result of different emphases in different classrooms and the difficulty teach- 
ers had documenting particular aspects of reading. Yet once teachers were 
aware of what was missing, they were confident they could include it. 
Teachers reached a high level of agreement when rating portfolios from 
both sites, and they enhanced their knowledge of teaching, learning, and
assessment through the scoring process. Valencia and Au suggested that
these results were a function of a supportive system for implementation 
which included district support, low-stakes, long-term professional develop- 
ment focused on assessment and instruction, and gradual implementation 
with an emphasis first on curriculum and instruction. They also suggested 
that the combination of required and optional pieces and the scoring pro- 
cess encouraged the flexibility of implementation and specificity of perfor- 
mance standards needed for portfolios to address both accountability and 
improved instruction. 

Stephens, Pearson, Gilrane, Roe, Stallman, Shelton, Weinzierl, Rodriguez and 
Commeyras (1995) also addressed contextual influences in their study of the 
relationship between assessment and instruction. Using in-depth case stud- 
ies of elementary schools in four school districts, they examined how deci- 
sions were made and how that process influenced the relationship between 
assessment and instruction. Qualitative analysis revealed that the relation- 
ship was not straightforward; the unique decision-making model in each dis- 
trict influenced the relationship. When the teachers had little authority or 
power over instructional decision-making, or when administrators were 
controlled by district staff, an “assessment-as-test” mentality drove instruc- 
tion. In other words, when responsibility and accountability were to exter- 
nal forces, tests did drive instruction, and not necessarily in positive ways. 
When the culture of the district was one of responsibility to individual learn- 
ers and decisions were based on individual or collective perspectives of 
teachers, assessment as test did not appear to drive instruction. Stephens et 
al. raise the question of whether reform aimed at teacher empowerment can 
coexist with external accountability when school culture exerts such a 
strong influence on teachers’ practice. 

Language arts standards. Few literacy researchers have conducted research aimed
directly at either state or national English language arts standards and, as we have
noted previously, policy researchers interested in standards often analyze them in
terms of larger reform efforts, without specific regard for subject area. Three types
of studies characterize the nature of standards research from a literacy perspective:
document analysis, study of teachers’ practices, and the alignment of stan- 
dards with assessment. 

Wixson and Dutro (in press) conducted a document analysis of 42 state stan- 
dards in early reading/language arts as a way to gauge how the variability in 
standards might influence their translation into local curriculum, instruction 
and assessment. They found that the majority of state documents did not 
provide specific benchmarks or outcomes at grades K-3, that the docu- 
ments varied in the way they conceptualized and organized the area of read- 
ing, and that many of them included inappropriate content and/or ignored 
important content. When documents did provide benchmarks, many did not
provide a logical developmental progression across grades, and many of the
benchmarks themselves were overly broad or overly specific. In the former
case, Wixson and Dutro concluded, districts are provided insufficient guidance;
in the latter, the curriculum becomes prescriptive without much flexibility
for local interpretation. They recommend a balance
between specificity and generality if standards are to help local educators
engage in the conversations needed to advance teaching and learning.

In a different approach to examining standards, McGill-Franzen and Ward
(1997) first reviewed documents to determine the fit between New York 
State Language Arts and Social Studies frameworks with national standards. 
They then conducted case study interviews with K-4 teachers in four dis- 
tricts to determine how state standards were incorporated into teachers’ 
practices. All the participants in their case studies had been involved in some
sort of school-wide language arts curriculum development projects aimed at 
helping teachers reconceptualize teaching. Consistent with Wixson and 
Dutro’s (in press) recommendations, they found New York State standards 
reflected the national standards in orientation to reading process and learn- 
ing, and actually went beyond national standards to provide a level of speci- 
ficity that helps teachers know what students should know and be able to 
do at different developmental levels. However, they also found that teachers 
interpreted the state standards differentially. If teachers were under pressure 
to improve scores on tests (which were not aligned with the standards), and 
worked under conditions that restricted their authority and responsibility 
for instructional decision-making, they were less likely to reconceptualize 
curriculum and evaluation in their schools. Since this study, New York State 
has restructured its assessments to align with its curriculum. We do not
yet have data to know whether the results of McGill-Franzen and Ward (1997)
will be replicated with the new assessments. 

The last approach to research on standards is found in a study by Bruce, 
Osborn, and Commeyras (1993) in which they examined the alignment 
between NAEP reading assessment items and the NAEP reading framework 
(standards). Using data from interviews, expert panels, and surveys of hun- 
dreds of literacy educators, Bruce et al. concluded that although most liter- 
acy experts agreed that the NAEP framework reflected current research and 
practice, the experts judged the alignment between framework and test 
items to be “murky.” Items could not be mapped clearly onto the framework
and, in practice, the items often failed to capture the intent behind the 
framework. Even with a sound framework, the translation into large-scale 
assessment items was problematic. 



Summary and Implications 



Literacy researchers bring a deliberate subject-matter focus to bear on ques- 
tions of standards and assessment. For the most part, they look more deeply 
at literacy than either policy or measurement researchers by examining spe- 
cific aspects of literacy instruction (e.g., writing process, qualities of writ- 
ing, alignment of assessment with constructivist curriculum frameworks in 
literacy, specificity of state standards) and by situating much of their work in 
classrooms or in direct interactions with teachers. The studies in this section 
suggest that instructional change in language arts does occur with reform, 
but that it is mediated by teachers’ beliefs, knowledge, and their sense of 
accountability pressure. In studies which integrate professional develop- 
ment with assessment reform, results are most positive both in terms of 
teachers’ learning and attitudes toward change, and in terms of usable 
assessment information. Reform without this support seems to produce sur- 
face-level change and questionable assessment practices. At the same time, 
the work on standards and implementation of new assessments suggests that 
the translation from literacy research to standards, and from standards to
assessment, is not straightforward. Overall, an emerging theme is one of tension
between the need for accountability and specificity on the one hand,
and teacher decision-making and flexibility in interpretation on the other. 



Conclusions and Future Directions 



Throughout this report we have attended to both the nature of policy- 
oriented research on literacy standards and assessments and the impact of 
standards and assessment on literacy practice and student learning. Our con- 
clusions address both of these issues. 

With regard to the nature of policy-oriented research, it is clear that the 
research differs with regard to questions, methods, and audience as a func- 
tion of the perspective from which it arises. In general, we see policy 
researchers concerned with broad reforms involving standards, assessments, 
reorganization, governance, and the like; literacy is simply one of the sub- 
jects, and standards and assessment are two of the “tools” or levers of reform 
they study. Policy researchers’ questions focus on the system, and their data 
are gathered through teachers’ reported and actual practices. For the most 
part, we found few policy researchers distinguishing among subject areas 
within policy or spending extended time in literacy classrooms. Measure- 
ment researchers, as we might expect, are most interested in the assessment 
components of reform and are particularly concerned with validity issues 
and with the psychometric qualities of new assessments that are needed for 
accountability and policy purposes. For the most part, they rely on statistical 
analysis and, to some extent, self-reports, interviews, and artifacts to address 
their questions. Literacy researchers generally ask questions about instruc- 
tion and learning in relation to research and theory. Do new standards and 
assessments result in better reading and writing instruction? Do they 
advance teacher understanding? Are the reforms consistent with sound 
research and theory on literacy learning? Just as literacy is the vehicle for 
many policy studies, policy is the vehicle for many literacy studies. Literacy 
researchers typically look closely at actual classroom practices, teachers’ 
understanding, and artifacts of students’ literacy learning, and they work
more directly with teachers than either policy or measurement researchers. 
For the most part, their new assessments are not subjected to the rigor of 
measurement researchers’ criteria, and the policy contexts for their work 
are not considered in a systemic way. 

The picture that emerges is that of a trade-off between general and in-depth 
information. Studies that address general questions provide information that 
is useful for understanding the larger issues of systemic reform (e.g., restruc- 
turing, governance, standards, and assessment) and the contexts in which 
these reforms are implemented. In contrast, research that addresses ques- 
tions about classroom practice in relation to specific subject matter provides 
insight into what happens at the individual teacher and student levels. It 
attends to teacher understanding and practice, and student learning, often 
without specific attention to the policy environment in which change is 
enacted. As policy-oriented research grows and matures, we see a greater 
need to attend to both the macro- and micro-views of reform, practice, and
learning, and we suspect there will be more cross-over among the studies
representing policy, measurement, and literacy perspectives. 

With respect to the influence of policy, it is clear that literacy standards and 
assessment do have an influence on teachers’ beliefs and practice. However, 
the influence is not always in the expected or desired direction. The effect is
mediated by numerous factors: teachers’ knowledge, beliefs, and existing 
practices; the economic, social, philosophical, and political conditions of 
the school or district; the stakes attached to the policy; and the quality of the 
support and lines of communication provided to teachers and administra- 
tors. It is equally clear that policy alone is not sufficient to promote change; 
simply implementing new assessments or creating new standards does not 
ensure improved teaching or learning. What is less clear, however, is what it 
would take to promote change in the desired direction or to ensure 
improved teaching and learning. To be sure, discipline-specific professional 
development is implicated in many studies, but we need to know more 
about professional development processes and the quality of those pro- 
cesses. How, for example, do teachers and districts learn about new literacy 
standards and assessments? How are districts and teachers supported to 
understand the theory and research that underlie new assessments or new 
content standards? How do these experiences shape teachers’ understand- 
ing and practices? What are effective models for professional development? 
These are not only important questions for educators; they are critical to 
policymakers who are being asked to support professional development as 
part of reform (Elliott, 1996; Hart, 1996). 

Among the factors mediating the effects on literacy teaching and learning, 
the research suggests that more specificity in standards and assessment pro- 
motes changes in the desired direction. A caution we would offer in this 
regard, however, is that many of the deeper levels of change in teacher 
beliefs and practices associated with literacy learning do not lend them- 
selves to simple directives. There is a very fine line between offering suffi- 
cient guidance for teachers and districts to undertake substantive change, 
and being prescriptive in ways that work against teacher learning, decision- 
making, and flexibility. Similarly, the influence of standards and assessments 
is likely mediated by teachers’ and administrators’ stance toward policy. Do 
they see policy as a means for monitoring, controlling, or helping educators 
do their work? Why, for example, would teachers bring impoverished under- 
standings of assessments and standards to policy work in their districts, yet 
demonstrate deep understanding in their classrooms? How do the messages 
teachers receive about standards and assessments fit or conflict with other 
policies in their environments such as mandated curriculum and materials, 
alternative teacher certification programs, site-based decision making, and 
the like? 

Finally, there is still uncertainty about the quality of the new literacy assess- 
ments and standards; they must stand the test of scrutiny of policymakers 
outside education, as well as educators themselves. If the tools themselves 
are problematic, political credibility and deeper-order change are highly 
improbable. That said, we also suggest that although there is currently a 
move away from the more elaborate forms of performance assessments (e.g.,
California, Arizona, Kentucky), these decisions are rarely based on psychometric
qualities alone. Policy, resources, and politics weigh heavily in decisions
about the feasibility and desirability of new assessments and new
standards.

We conclude from our synthesis that there is a pressing need to conduct 
research on policy issues, such as standards and assessments, with specific 
attention to subject matter. There has been an almost tacit assumption 
among policy and measurement researchers that whatever holds true for 
one subject will likely be the case for others. When it is possible to look 
across subject areas such as mathematics and literacy, the analyses are rarely 
done. However, attention to depth of understanding of literacy processes, 
learning, and instruction is even more important. At the heart of this issue 
are questions about what it means to read and write with understanding; 
what teaching for understanding looks like in different classrooms; and what 
constitutes the domain of the English language arts curriculum. Future 
research must look both across and within the subject matter of literacy; this 
requires subject-matter expertise, and it requires more than self-report data. 

Issues of student achievement will need to be confronted as well. Few stud- 
ies we reviewed included direct measures of student learning. In one sense, 
this is understandable since most reform is fairly new and change is a long- 
term process. Nevertheless, pressure is mounting on educators to show 
results in terms of achievement. Future researchers will need to address the 
challenge, finding meaningful ways to document student achievement while 
documenting formative measures of progress such as parents’ understanding 
of instructional goals, teachers’ priorities and their practice, teacher under- 
standing, and surface-level changes in materials and activities. 

As we write this report in the late 1990s, there continues to be a ground- 
swell of new policies related to literacy standards and assessment, and there 
is new interest in policies related to instructional strategies. Whether we like
it or not, literacy researchers have been drawn into policy. At worst we will 
be recipients of policy; at best we will be informers of policy. In our opin- 
ion, the best way to influence policy and teacher development is for policy, 
measurement, and literacy researchers to work together and to communi- 
cate the findings of their collaborative work in a wide range of journals and 
reports, and through participation in state and national councils. Literacy 
researchers need to become knowledgeable about policy research and 
about the policy contexts in which their research is conducted. At the same 
time, we must reach out to policy researchers and measurement researchers,
bringing to their work a deep understanding of the subject matter of literacy
and the pedagogical content knowledge needed to teach well. Without
this collaborative commitment, policy will not reflect or inform meaningful 
changes in literacy teaching and learning; measurement will not encourage 
substantive instructional change or provide useful assessment information to 
literacy educators; and literacy educators will not have a voice in policy and 
measurement arenas. With a collaborative research agenda and a wider audi- 
ence, we can improve the lives of children and teachers. 

The Center for the Improvement of Early Reading Achievement (CIERA) is
the national center for research on early reading and represents a consor- 
tium of educators in five universities (University of Michigan, University of 
Virginia, and Michigan State University with University of Southern Califor- 
nia and University of Minnesota), teacher educators, teachers, publishers of 
texts, tests, and technology, professional organizations, and schools and 
school districts across the United States. CIERA is supported under the Edu- 
cational Research and Development Centers Program, PR/Award Number 
R305R70004, as administered by the Office of Educational Research and 
Improvement, U.S. Department of Education. 

Mission. CIERA’s mission is to improve the reading achievement of Amer- 
ica’s children by generating and disseminating theoretical, empirical, and 
practical solutions to persistent problems in the learning and teaching of 
beginning reading. 



CIERA Research Model 



The model that underlies CIERA’s efforts acknowledges many influences on 
children’s reading acquisition. The multiple influences on children’s early 
reading acquisition can be represented in three successive layers, each yield- 
ing an area of inquiry of the CIERA scope of work. These three areas of 
inquiry each present a set of persistent problems in the learning and teach- 
ing of beginning reading: 

CIERA INQUIRY 1: Readers and Texts

Characteristics of readers and texts and their relationship to early
reading achievement. What are the characteristics of readers and texts
that have the greatest influence on early success in reading? How can chil- 
dren’s existing knowledge and classroom environments enhance the factors 
that make for success? 

Home and school effects on early reading achievement. How do the
contexts of homes, communities, classrooms, and schools support high lev- 
els of reading achievement among primary-level children? How can these 
contexts be enhanced to ensure high levels of reading achievement for all 
children? 

Policy and professional effects on early reading achievement. How
can new teachers be initiated into the profession and experienced teachers 
be provided with the knowledge and dispositions to teach young children to 
read well? How do policies at all levels support or detract from providing all 
children with access to high levels of reading instruction? 

Valencia and Wixson have gathered scholarship from three disparate sources
to produce the first integrated review of policy-related research on literacy 
education: (a) policy analyses that examine policies about literacy in the 
framework of systemic reform; (b) measurement and evaluation studies, 
conducted by psychometricians, that examine the assessments mandated by 
policies; and (c) studies by literacy researchers that attend to policies and 
literacy-specific content. Each of these literatures has a unique set of ques- 
tions, frameworks, methodologies, and audiences. Together, they provide a 
comprehensive perspective on policies and literacy education. 

The first studies, research from the policy perspective, show that policy
tools such as curriculum frameworks and assessments influence practice. 
However, because studies from the policy perspective often fail to distinguish 
between content such as primary and middle grade reading, findings are 
limited. Typically, these studies conclude that influences are complex, varying 
as a function of the administrators and teachers who work with the tools. 

Studies from a measurement perspective also fail to distinguish between tasks 
and levels of literacy learning. As a consequence, rather global conclusions 
such as the following result: (a) new assessments do influence teachers’ 
practices; (b) accountability creates stress for teachers, which may influence 
the tools’ implementation; and (c) results have been inconclusive as to the 
ability of new assessments to provide reliable, valid data for accountability. 

The third group of studies, by literacy researchers, examines policies in 
relation to specific levels of literacy and in specific classroom contexts.
These studies show that policies or mandated assessments can influence
instructional practices and children’s learning differently as a function of 
developmental level and the literacy dimension. For example, when man- 
dated assessments emphasize lower-level literacy skills, children’s reading of 
challenging literature or writing in a variety of genres can get short shrift in a 
classroom. 

The best way to influence policy, instruction, and children’s learning, say 
Valencia and Wixson, is for policy, measurement, and literacy researchers to 
collaborate in conducting and reporting research. Each group needs to learn 
more about the others’ work in order to effect real change in literacy practice.